An Efficiently Computable Support Measure for Frequent Subgraph Pattern Mining

Authors

  • Yuyi Wang
  • Jan Ramon
Abstract

Frequent subgraph pattern mining is an important graph mining task. While the majority of existing systems focus on the transactional setting, where every transaction is a separate, independent graph, in recent years there has been an increasing interest in mining large networks. In particular, given a network (also called a database graph) D, a pattern language L, a frequency measure freq, and a minimal frequency threshold minsup, the task of frequent pattern mining is to list all patterns P ∈ L such that freq(D, P) ≥ minsup. Unfortunately, in this problem formulation, choosing a good frequency measure freq has proven to be challenging [1, 2, 3, 4]. Ideally, a frequency measure (which measures the number of occurrences of a pattern P in a graph D) is a function satisfying the following properties:

1. Anti-monotonic: the support of a pattern should not be larger than the support of any of its subpatterns. Therefore, we cannot simply use the number of images of a pattern as its support (an image is a subgraph of the database graph that is isomorphic to the pattern).
2. Normalized: for every pattern that has only independent images in a database graph, its support in that database graph equals the number of images. Images are independent if they do not overlap according to some notion of overlap, such as sharing a vertex or an edge. In this paper, we use vertex-overlap.
3. Statistical soundness: the function should give a measure of the number of independent observations of a phenomenon (the pattern).

An important class of anti-monotonic normalized support measures relies on overlap graphs. Given a database graph D and a subgraph pattern P, the vertices of the overlap graph G_P^D are the images of P in D, and two vertices are adjacent iff the corresponding images overlap in D (here, non-overlap is used as an approximation of statistical independence).
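The overlap graph definition above can be illustrated with a minimal Python sketch. It assumes that each image is represented only by its set of database vertices, which suffices for vertex-overlap. The function name `build_overlap_graph` and the toy path graph a-b-c-d are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
from itertools import combinations


def build_overlap_graph(images):
    """Return the overlap graph as adjacency sets over image indices.

    Two images are adjacent iff they share at least one database vertex
    (vertex-overlap).
    """
    adjacency = {i: set() for i in range(len(images))}
    for i, j in combinations(range(len(images)), 2):
        if images[i] & images[j]:  # non-empty intersection means overlap
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


# Hypothetical example: the pattern is a single edge and the database
# graph is the path a-b-c-d, so the images are its three edges.
images = [frozenset("ab"), frozenset("bc"), frozenset("cd")]
overlap = build_overlap_graph(images)

# Images 0 and 2 share no vertex, so they are independent of each other.
independent_pair = 2 not in overlap[0]
```

In this toy case the pattern has three images, but only two of them are mutually independent, which is why the raw image count is not a suitable support.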
An overlap graph based support measure (OGSM) takes the overlap graph of a pattern in a database graph as its input and outputs the support of that pattern in that database graph. Vanetik et al. [1] proposed the first OGSM: the size of the maximum independent set (MIS) of the overlap graph. Unfortunately, computing the MIS of an overlap graph is NP-hard, and it has been shown that the MIS cannot be efficiently approximated even within a factor of n^{1−o(1)} [5], where n is the order of the overlap graph. Calders et al. [4] proposed the Lovász θ value (see, e.g., [6, 7]), which is computable in time polynomial in the order of the overlap graph using semidefinite programming (SDP). A straightforward application of a general purpose SDP solver yields a running time of O(n). Approximation algorithms exist, but even these approximate methods are still computationally too expensive for our purposes.

In this paper, we propose a new support measure s which is a solution to a (usually sparse) linear program. As we are using vertex-overlap, each vertex v in a database graph D determines a clique in the overlap graph G_P^D, where P is a pattern. That is, the images which share the vertex v form a clique in G_P^D. Based on this observation, we introduce the overlap hypergraph H_P^D, whose vertices are the images of P in D and whose hyperedges are these cliques. The s measure is an overlap hypergraph based support measure (OHSM). In order to define the s measure, a vector x indexed by the vertices of H_P^D is assigned to H_P^D, i.e., the variable x_v is assigned to the vertex v. The linear program of the s measure is,
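The linear program itself is not given in the text above, so the following sketch does not attempt to reproduce it. It only illustrates, under stated assumptions, two constructions the abstract describes: the hyperedges of the overlap hypergraph (one per database vertex, containing every image that uses that vertex) and the MIS-based support of Vanetik et al., computed here by exhaustive search, which is feasible only for tiny inputs. The function names and the toy path graph a-b-c-d are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def overlap_hyperedges(images):
    """Map each database vertex v to the set of image indices containing v.

    Under vertex-overlap, every such set is a clique in the overlap graph.
    """
    by_vertex = defaultdict(set)
    for index, image in enumerate(images):
        for v in image:
            by_vertex[v].add(index)
    return dict(by_vertex)


def mis_support(images):
    """Size of a largest set of pairwise non-overlapping images.

    Exhaustive search over subsets: exponential time, illustration only.
    """
    n = len(images)
    for k in range(n, 0, -1):
        for subset in combinations(range(n), k):
            if all(not (images[i] & images[j])
                   for i, j in combinations(subset, 2)):
                return k
    return 0


# Hypothetical example: single-edge pattern in the path a-b-c-d.
images = [frozenset("ab"), frozenset("bc"), frozenset("cd")]
hyperedges = overlap_hyperedges(images)
support = mis_support(images)
```

Here the hyperedges for b and c each contain two images, reflecting the two overlaps, while the MIS-based support counts images 0 and 2 as the largest independent set.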

Similar resources

Mining Frequent Neighborhood Patterns in Large Labeled Graphs

Over the years, frequent subgraphs have been an important sort of targeted patterns in the pattern mining literature, where most works deal with databases holding a number of graph transactions, e.g., chemical structures of compounds. These methods rely heavily on the downward-closure property (DCP) of the support measure to ensure an efficient pruning of the candidate patterns. When switching ...

Efficient Frequent Connected Induced Subgraph Mining in Graphs of Bounded Tree-Width

We study the frequent connected induced subgraph mining problem, i.e., the problem of listing all connected graphs that are induced subgraph isomorphic to a given number of transaction graphs. We first show that this problem cannot be solved for arbitrary transaction graphs in output polynomial time (if P ≠ NP) and then prove that for graphs of bounded tree-width, frequent connected induced su...

Mining Frequent Graph Sequence Patterns Induced by Vertices

The mining of a complete set of frequent subgraphs from labeled graph data has been studied extensively. Furthermore, much attention has recently been paid to frequent pattern mining from graph sequences (dynamic graphs or evolving graphs). In this paper, we define a novel class of subgraph subsequence called an “induced subgraph subsequence” to enable efficient mining of a complete set of freq...

FP-GraphMiner - A Fast Frequent Pattern Mining Algorithm for Network Graphs

In recent years, graph representations have been used extensively for modelling complicated structural information, such as circuits, images, molecular structures, biological networks, weblogs, XML documents and so on. As a result, frequent subgraph mining has become an important subfield of graph mining. This paper presents a novel Frequent Pattern Graph Mining algorithm, FP-GraphMiner, that c...

Vertex unique labelled subgraph mining

With the successful development of efficient algorithms for Frequent Subgraph Mining (FSM), this paper extends the scope of subgraph mining by proposing Vertex Unique labelled Subgraph Mining (VULSM). VULSM has a focus on the local properties of a graph and does not require external parameters such as the support threshold used in frequent pattern mining. There are many applications where the m...

Publication date: 2012